-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ssh tunnel support for jdbc connectors, issue #312 #5438
Add ssh tunnel support for jdbc connectors, issue #312 #5438
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is tunneling for discovering the catalog for jdbc connectors also going to be supported as part of this PR?
I'd like to take another look once the closing pieces are finished.
@@ -5,6 +5,12 @@ plugins { | |||
|
|||
dependencies { | |||
implementation 'commons-cli:commons-cli:1.4' | |||
implementation group: 'org.apache.sshd', name: 'sshd-mina', version: '2.7.0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implementation group: 'org.apache.sshd', name: 'sshd-mina', version: '2.7.0' | |
implementation 'org.apache.sshd:sshd-mina:2.7.0' |
nit: we generally use this format
} | ||
|
||
/** | ||
* Extension point for child classes to add things to the spec json tree before it's deserialized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Extension point for child classes to add things to the spec json tree before it's deserialized | |
* Extension point for child classes to modify the spec json tree before it's deserialized |
* @param root | ||
* @return root | ||
*/ | ||
public JsonNode addToSpec(JsonNode root) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public JsonNode addToSpec(JsonNode root) throws IOException { | |
public JsonNode transformSpec(JsonNode root) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it makes sense to not expose this method in BaseConnector
and instead just doing it on AbstractJdbcDestination
and AbstractJdbcSource
? I know it means that we will have to override spec()
in those respective classes, so it probably a tiny bit more code change. On the plus side though, I think this transformation is going to be pretty specific to our JDBC sources and destinations, so exposing this transform just in those abstractions as opposed to all of our other connectors keeps the base a bit simpler.
@@ -42,7 +44,18 @@ | |||
public ConnectorSpecification spec() throws Exception { | |||
// return a JsonSchema representation of the spec for the integration. | |||
final String resourceString = MoreResources.readResource("spec.json"); | |||
return Jsons.deserialize(resourceString, ConnectorSpecification.class); | |||
return Jsons.object(addToSpec(Jsons.deserialize(resourceString)), ConnectorSpecification.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return Jsons.object(addToSpec(Jsons.deserialize(resourceString)), ConnectorSpecification.class); | |
return Jsons.object(transformSpec(Jsons.deserialize(resourceString)), ConnectorSpecification.class); |
String remoteDatabasePort, | ||
String tunnelDatabasePort) { | ||
if (method == null) { | ||
this.method = "NO_TUNNEL"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should us a constant for NO_TUNNEL
since it's used in a couple places
this.tunnelDatabasePort = tunnelDatabasePort; | ||
} | ||
|
||
public static SSHTunnel getInstance(JsonNode config) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading through the code is a bit confusing for the case where the SSHTunnel
doesn't have a valid configuration.
Maybe it'd be easier to read if this was Optional<SSHTunnel> getInstance(...)
? Then the calls to it could be done with tunnel.ifPresent(...)
which might read a bit better to show that this is an optional feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we instead explicitly validate which case we're in (no tunnel/ssh key/pw auth) and do null checks/validation based on that?
/** | ||
* Closes a tunnel if one was open, and otherwise doesn't do anything (safe to run). | ||
*/ | ||
public void closeTunnel() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could actually be close()
and the class could be AutoClosable
so we could do this with a try-with-resources block instead of a try-finally block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code changes LGTM - should tests go in a separate PR or this one?
this.tunnelDatabasePort = tunnelDatabasePort; | ||
} | ||
|
||
public static SSHTunnel getInstance(JsonNode config) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we instead explicitly validate which case we're in (no tunnel/ssh key/pw auth) and do null checks/validation based on that?
* Encapsulates the connection configuration for an ssh tunnel port forward through a proxy/bastion | ||
* host plus the remote host and remote port to forward to a specified local port. | ||
*/ | ||
public class SSHTunnel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to extend autocloseable so we can use a try-with-resources?
"tunnel_method": { | ||
"description": "No ssh tunnel needed to connect to database", | ||
"type": "string", | ||
"enum": ["NO_TUNNEL"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use const: "NO_TUNNEL"
instead of enum
/default
like described here? enum
/default
shows two dropdowns to the user which is confusing.
Separately it might be better for the UI to name all these options without undesrcores e.g No Tunnel
, SSH Key Auth
, SSH Password Auth
etc.. since it's more user friendly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only doing comment because i think we are still waiting on integration tests.
@@ -11,6 +11,7 @@ application { | |||
dependencies { | |||
implementation 'com.google.cloud:google-cloud-storage:1.113.16' | |||
implementation 'com.google.auth:google-auth-library-oauth2-http:0.25.5' | |||
implementation group: 'org.apache.sshd', name: 'sshd-mina', version: '2.7.0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implementation group: 'org.apache.sshd', name: 'sshd-mina', version: '2.7.0' | |
implementation 'org.apache.sshd:sshd-mina:2.7.0' |
@@ -0,0 +1,167 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this file the same as the one in airbyte-integrations/connectors/destinations-relational-db/src/main/resources/ssh-tunnel-spec.json
? Is there any way we can DRY it out?
* @param root | ||
* @return root | ||
*/ | ||
public JsonNode addToSpec(JsonNode root) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it makes sense to not expose this method in BaseConnector
and instead just doing it on AbstractJdbcDestination
and AbstractJdbcSource
? I know it means that we will have to override spec()
in those respective classes, so it probably a tiny bit more code change. On the plus side though, I think this transformation is going to be pretty specific to our JDBC sources and destinations, so exposing this transform just in those abstractions as opposed to all of our other connectors keeps the base a bit simpler.
if (getHost() == null) { | ||
throw new RuntimeException("SSH Tunnel host is null - verify configuration before starting tunnel!"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (getHost() == null) { | |
throw new RuntimeException("SSH Tunnel host is null - verify configuration before starting tunnel!"); | |
} | |
Preconditions.notNull(getHost(), "SSH Tunnel host is null - verify configuration before starting tunnel!") |
we tend to use Preconditions for validations like this. 1. theoretically takes fewer lines of code (though that's not always true) 2. gives you a more specific error message (in this case NPE)
Integer.parseInt( | ||
getTunnelSshPort().trim())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integer.parseInt( | |
getTunnelSshPort().trim())) | |
Integer.parseInt(getTunnelSshPort().trim())) |
public String getMethod() { | ||
return method; | ||
} | ||
|
||
public String getHost() { | ||
return host; | ||
} | ||
|
||
public String getTunnelSshPort() { | ||
return tunnelSshPort; | ||
} | ||
|
||
public String getUser() { | ||
return user; | ||
} | ||
|
||
private String getSSHKey() { | ||
return sshkey; | ||
} | ||
|
||
private String getPassword() { | ||
return password; | ||
} | ||
|
||
public String getRemoteDatabaseHost() { | ||
return remoteDatabaseHost; | ||
} | ||
|
||
public String getRemoteDatabasePort() { | ||
return remoteDatabasePort; | ||
} | ||
|
||
public String getTunnelDatabasePort() { | ||
return tunnelDatabasePort; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the benefit of exposing all of these getters publicly? should they at least be private? or since these variables are only used internally, can we just reference the fields directly?
Existing comments on the PR are probably mostly still relevant; I haven't had a chance to address them. Work is being paused due to timelines for a different feature. This one encountered a lot of complexity around the Source's read() method hierarchy, which is making testing more complex. What still needs to be done includes identifying the root cause of the test failures for read (either no tunnel starts when needed, or tunnel tries to start twice and port-conflicts itself), get test case for postgres passing, and then go back and address the getSpec() method used for all jdbc connectors in tests, because they're assuming they don't have the ssh tunnel method spec.json nodes in there (but those are added at runtime, and thus the test must accommodate that). Right now that has to be done with duplicated code, because we also haven't addressed the structural issues with test class inheritance that would improve DRY for that. Basically there are a bunch of structural issues that should be addressed by refactors, but I don't have the time right now to do them. These are notes for when I come back later to try to figure it out further. To be clear: I expect that destination tests and source tests for non-postgres jdbc connectors are currently failing on testGetSpec() because the static json file they're comparing against doesn't have ssh tunnel configuration in it. I believe source changes and destination changes could be separated into different PRs, but we still have to address the testGetSpec() issue even for destinations. |
* but for now that API spec doesn't support that, so we'll do it here. | ||
* @throws Exception | ||
*/ | ||
void dropUniqueSchema() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to use and ssh tunnel to interact with the database for setting up test fixtures? aren't we just making life harder for ourselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah. because the test database we are using is only accessible via ssh.
@airbyte-jenny how were you testing this locally? was there a special docker image for postgres that you were using that had the bastion onboard? or were you just running on test infra? |
closing since this was done in other PRs |
What
Add ssh tunnel support to Java JDBC connectors.
How
Uses apache mina sshd client to connect through ssh, set up a tunnel, and port forward the database ports to where the connector can use them. Tunnel is initiated at a high level of abstraction and closed when it's no longer needed by that particular workflow.
The spec.json configuration required is injected during json deserialization, adding the ssh tunnel config to both the source and the destination for jdbc databases. That means that this feature becomes available to all jdbc sources and destinations without providing connector-specific config screen changes. The user of course does have to fill in specific port forward values per actual connection they create.
Recommended reading order
SSHTunnel.java
AbstractJdbcSource.java
AbstractJdbcDestination.java
Relates to #312